Sub-linear Queries Statistical Databases: Privacy with Power

نویسنده

  • Cynthia Dwork
چکیده

We consider a statistical database in which a trusted administrator introduces noise to the query responses with the goal of maintaining privacy of individual database entries. In such a database, a query consists of a pair (S, f) where S is a set of rows in the database and f is a function mapping database rows to {0, 1}. The true response is ∑ r∈S f(DBr), a noisy version of which is released. Results in [3, 4] show that a strong form of privacy can be maintained using a surprisingly small amount of noise, provided the total number of queries is sublinear in the number n of database rows. We call this a sub-linear queries (SuLQ) database. The assumption of sublinearity becomes reasonable as databases grow increasingly large. The SuLQ primitive – query and noisy reply – gives rise to a calculus of noisy computation. After reviewing some results of [4] on multi-attribute SuLQ, we illustrate the power of the SuLQ primitive with three examples [2]: principal component analysis, k means clustering, and learning in the statistical queries learning model.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Privacy-Preserving Datamining on Vertically Partitioned Databases

In a recent paper Dinur and Nissim considered a statistical database in which a trusted database administrator monitors queries and introduces noise to the responses with the goal of maintaining data privacy [5]. Under a rigorous definition of breach of privacy, Dinur and Nissim proved that unless the total number of queries is sub-linear in the size of the database, a substantial amount of noi...

متن کامل

A Method for Protecting Access Pattern in Outsourced Data

Protecting the information access pattern, which means preventing the disclosure of data and structural details of databases, is very important in working with data, especially in the cases of outsourced databases and databases with Internet access. The protection of the information access pattern indicates that mere data confidentiality is not sufficient and the privacy of queries and accesses...

متن کامل

Sensitivity of Counting Queries

In the context of statistical databases, the release of accurate statistical information about the collected data often puts at risk the privacy of the individual contributors. The goal of differential privacy is to maximise the utility of a query while protecting the individual records in the database. A natural way to achieve differential privacy is to add statistical noise to the result of t...

متن کامل

Broadening the Scope of Differential Privacy Using Metrics

Differential Privacy is one of the most prominent frameworks used to deal with disclosure prevention in statistical databases. It provides a formal privacy guarantee, ensuring that sensitive information relative to individuals cannot be easily inferred by disclosing answers to aggregate queries. If two databases are adjacent, i.e. differ only for an individual, then the query should not allow t...

متن کامل

The Trade-off between Privacy and Fidelity via Ehrhart Theory

As an increasing amount of data is gathered nowadays and stored in databases, the question arises of how to protect the privacy of individual records in a database even while providing accurate answers to queries on the database. Differential Privacy (DP) has gained acceptance as a framework to quantify vulnerability of algorithms to privacy breaches. We consider the problem of how to sanitize ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2005